Skip to content

fix: add process-level credential cache#2272

Merged
aknysh merged 1 commit intocloudposse:mainfrom
9spokes:fix/speedup-describe-affected
Apr 2, 2026
Merged

fix: add process-level credential cache#2272
aknysh merged 1 commit intocloudposse:mainfrom
9spokes:fix/speedup-describe-affected

Conversation

@AleksandrMatveev
Copy link
Copy Markdown
Contributor

@AleksandrMatveev AleksandrMatveev commented Apr 1, 2026

what

  • Added a process-level in-memory credential cache (sync.Map) to authenticateChain() that stores successfully authenticated credentials keyed by realm + chain identity
  • When a subsequent authentication request matches the same realm and chain within the same CLI invocation, the cached credentials are returned (after expiration validation) without making additional API calls
  • The previous fix (dbcba35) that skips cached target identity credentials in keyring/file storage remains intact - this cache layer sits above it

why

  • The previous fix correctly prevented stale cached credentials from being returned by always forcing re-authentication of the target identity (e.g., AssumeRole). However, during atmos describe affected, each !terraform.state YAML function resolution creates a new AuthManager via resolveAuthManagerForNestedComponent, and each one triggers a full AssumeRole API call
  • This caused atmos describe affected to degrade from ~2 minutes to ~17 minutes due to N redundant STS AssumeRole calls for N components sharing the same auth chain
  • The in-memory cache is inherently safe: unlike keyring/file caches that persist across processes and may contain stale data from different auth mechanisms (e.g., pod credentials vs. role credentials), process-scoped credentials were authenticated during the current invocation and are guaranteed correct
  • Cached entries are validated against the existing expiration buffer (minCredentialValidityBuffer = 15m) before reuse, and the cache resets naturally when the process exits

references

Summary by CodeRabbit

  • New Features

    • Process-level in-memory credential cache to speed repeated authentications and share valid credentials across instances.
    • Automatic detection and removal of expired/invalid cached credentials with transparent re-authentication.
    • Ability to reset the process credential cache.
  • Tests

    • Tests verifying cache hits, isolation by realm/chain, expiration handling, and cache reset behavior.

@AleksandrMatveev AleksandrMatveev requested a review from a team as a code owner April 1, 2026 11:27
@github-actions github-actions bot added the size/m Medium size PR label Apr 1, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Adds a process-level in-memory credential cache keyed by realm and authentication chain. authenticateChain checks the cache first and validates entries, falls back to the prior bottom-up authentication on miss/stale entry, stores successful results, and adds internal helpers to compute/reset cache keys for tests.

Changes

Cohort / File(s) Summary
Credential cache implementation
pkg/auth/manager_chain.go
Introduces a process-level processCredentialCache keyed by realm + auth chain (chainCacheKey), fast-path cache lookup & isCredentialValid check in authenticateChain; on miss/expired falls back to existing bottom-up authenticateFromIndex flow; stores successful credentials; adds unexported resetProcessCredentialCache and helper logic.
Credential cache tests
pkg/auth/manager_chain_process_cache_test.go
New tests exercising process-wide caching: verifies cross-manager cache hits for identical realm+chain, distinct-chain isolation, expiry-triggered re-authentication, realm isolation, and resetProcessCredentialCache clears the global cache. Adds countingIdentity test stub to assert Authenticate invocation counts.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client
participant Manager
participant ProcessCache as "Process Cache"
participant Identity as "Identity/AuthProvider"
participant Realm as "Realm (m.realm)"
Client->>Manager: authenticateChain(request)
Manager->>Realm: compute realm value
Manager->>ProcessCache: lookup key(realm + chain)
alt cache hit
ProcessCache-->>Manager: cached credentials
Manager->>Manager: isCredentialValid(credentials)
alt valid
Manager-->>Client: return cached credentials
else invalid/expired
Manager->>ProcessCache: delete stale entry
Manager->>Identity: authenticateFromIndex(...) (bottom-up)
Identity-->>Manager: credentials
Manager->>ProcessCache: store credentials
Manager-->>Client: return credentials
end
else cache miss
Manager->>Identity: authenticateFromIndex(...) (bottom-up)
Identity-->>Manager: credentials
Manager->>ProcessCache: store credentials
Manager-->>Client: return credentials
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

minor

Suggested reviewers

  • aknysh
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: add process-level credential cache' directly matches the PR's core change: implementing an in-memory credential cache at the process level to avoid duplicate authentication calls.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/auth/manager_chain.go (1)

55-87: ⚠️ Potential issue | 🟠 Major

Concurrent misses still fan out authentication.

This is race-safe, but not duplicate-safe. Two goroutines can both miss on Line 56, both run authenticateFromIndex, and both Store on Line 85. Since describe affected reaches auth from parallel workers, the first wave can still issue duplicate AssumeRole calls for the same realm+chain. Gate the miss path with per-key singleflight/in-flight coordination, then lock it down with a same-key concurrent test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/auth/manager_chain.go` around lines 55 - 87, The current cache miss path
(using chainCacheKey + processCredentialCache) is race-safe but allows duplicate
authentication because concurrent goroutines can both miss at the initial Load
and both call authenticateFromIndex, causing duplicate AssumeRole calls; fix by
adding per-key in-flight coordination (e.g., use singleflight.Group or a per-key
mutex map) around the miss path: when Load misses, enter the singleflight call
keyed by chainCacheKey, inside the singleflight function re-check
processCredentialCache.Load (or call findFirstValidCachedCredentials) to avoid a
lost-race, then call authenticateFromIndex only once and Store the resulting
processCachedCreds to processCredentialCache before returning; ensure the public
path around cache load -> singleflight -> re-check -> authenticateFromIndex ->
Store uses chainCacheKey and references processCredentialCache,
authenticateFromIndex, findFirstValidCachedCredentials, processCachedCreds and
m.chainCacheKey.
🧹 Nitpick comments (1)
pkg/auth/manager_chain.go (1)

36-42: Keep the reset hook out of the production API.

This helper exists only for test isolation, and the new tests already live in package auth, so an unexported resetProcessCredentialCache() would work here. If you ever need it from external-package tests, export_test.go keeps that surface test-only instead of making it part of the runtime package API.

Based on learnings: pkg/utils now exposes ResetPathMatchCache() and ResetGlobMatchesCache() (export_test.go).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/auth/manager_chain.go` around lines 36 - 42, The exported
ResetProcessCredentialCache helper is test-only and should be hidden from the
production API: rename the function ResetProcessCredentialCache to an unexported
resetProcessCredentialCache and keep its logic operating on
processCredentialCache.Range/Delete; if external-package tests need access, move
or duplicate an exported wrapper into an export_test.go test-only file instead
of keeping this in the runtime package API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/auth/manager_chain.go`:
- Around line 45-47: The current chainCacheKey (manager.chainCacheKey) builds a
key by concatenating m.realm.Value and strings.Join(m.chain, "->"), which can
collide when values contain ":" or "->"; replace this with an unambiguous
encoding (e.g., JSON-encode a small struct {Realm string; Chain []string} or
base64 of a deterministic encoding) so realm and chain elements are uniquely
separated; update chainCacheKey to return that encoded string (and handle any
encoding error deterministically) instead of the current simple concatenation.

---

Outside diff comments:
In `@pkg/auth/manager_chain.go`:
- Around line 55-87: The current cache miss path (using chainCacheKey +
processCredentialCache) is race-safe but allows duplicate authentication because
concurrent goroutines can both miss at the initial Load and both call
authenticateFromIndex, causing duplicate AssumeRole calls; fix by adding per-key
in-flight coordination (e.g., use singleflight.Group or a per-key mutex map)
around the miss path: when Load misses, enter the singleflight call keyed by
chainCacheKey, inside the singleflight function re-check
processCredentialCache.Load (or call findFirstValidCachedCredentials) to avoid a
lost-race, then call authenticateFromIndex only once and Store the resulting
processCachedCreds to processCredentialCache before returning; ensure the public
path around cache load -> singleflight -> re-check -> authenticateFromIndex ->
Store uses chainCacheKey and references processCredentialCache,
authenticateFromIndex, findFirstValidCachedCredentials, processCachedCreds and
m.chainCacheKey.

---

Nitpick comments:
In `@pkg/auth/manager_chain.go`:
- Around line 36-42: The exported ResetProcessCredentialCache helper is
test-only and should be hidden from the production API: rename the function
ResetProcessCredentialCache to an unexported resetProcessCredentialCache and
keep its logic operating on processCredentialCache.Range/Delete; if
external-package tests need access, move or duplicate an exported wrapper into
an export_test.go test-only file instead of keeping this in the runtime package
API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a309404-eef4-40f2-97f7-3ed46b5a5711

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0a3dd and 3bad528.

📒 Files selected for processing (2)
  • pkg/auth/manager_chain.go
  • pkg/auth/manager_chain_process_cache_test.go

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 1, 2026
@AleksandrMatveev AleksandrMatveev force-pushed the fix/speedup-describe-affected branch 2 times, most recently from 721bc26 to 6bafe3f Compare April 1, 2026 21:30
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/auth/manager_chain.go (1)

51-89: Add perf.Track to authenticateChain().

The cache hit/miss flow looks good. I’d still add defer perf.Track(nil, "auth.Manager.authenticateChain")() here so the latency win is measurable and this hot path stays aligned with the repo’s tracking convention.

♻️ Proposed change
 func (m *manager) authenticateChain(ctx context.Context, _ string) (types.ICredentials, error) {
+	defer perf.Track(nil, "auth.Manager.authenticateChain")()
+
 	// Fast path: check process-level in-memory cache.
 	// Credentials authenticated during this process are guaranteed correct, unlike
 	// keyring/file caches which may hold stale data from previous runs.

Based on learnings "performance tracking with defer perf.Track() is enforced on all functions via linting, including high-frequency utility functions, formatters, and renderers."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/auth/manager_chain.go` around lines 51 - 89, Add performance tracking to
authenticateChain by invoking perf.Track at function entry: in the method
manager.authenticateChain, immediately after the function begins, add a deferred
call using defer perf.Track(nil, "auth.Manager.authenticateChain")() so the
function's latency is recorded; ensure the perf package is imported if not
already. This keeps the hot path aligned with repo tracking conventions and
measures cache hit/miss latency for authenticateChain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/auth/manager_chain.go`:
- Around line 51-89: Add performance tracking to authenticateChain by invoking
perf.Track at function entry: in the method manager.authenticateChain,
immediately after the function begins, add a deferred call using defer
perf.Track(nil, "auth.Manager.authenticateChain")() so the function's latency is
recorded; ensure the perf package is imported if not already. This keeps the hot
path aligned with repo tracking conventions and measures cache hit/miss latency
for authenticateChain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c20f5b48-166c-4131-86ce-45bbab3fe268

📥 Commits

Reviewing files that changed from the base of the PR and between 3bad528 and 6bafe3f.

📒 Files selected for processing (2)
  • pkg/auth/manager_chain.go
  • pkg/auth/manager_chain_process_cache_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/auth/manager_chain_process_cache_test.go

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 1, 2026
@aknysh aknysh added the patch A minor, backward compatible change label Apr 1, 2026
@autofix-ci
Copy link
Copy Markdown
Contributor

autofix-ci bot commented Apr 1, 2026

Hi! I'm autofix logoautofix.ci, a bot that automatically fixes trivial issues such as code formatting in pull requests.

I would like to apply some automated changes to this pull request, but it looks like I don't have the necessary permissions to do so. To get this pull request into a mergeable state, please do one of the following two things:

  1. Allow edits by maintainers for your pull request, and then re-trigger CI (for example by pushing a new commit).
  2. Manually fix the issues identified for your pull request (see the GitHub Actions output for details on what I would like to change).

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.20%. Comparing base (39a9be1) to head (1533c1e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2272      +/-   ##
==========================================
+ Coverage   77.16%   77.20%   +0.03%     
==========================================
  Files        1034     1034              
  Lines       97576    97597      +21     
==========================================
+ Hits        75296    75346      +50     
+ Misses      18073    18046      -27     
+ Partials     4207     4205       -2     
Flag Coverage Δ
unittests 77.20% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
pkg/auth/manager_chain.go 89.90% <100.00%> (+2.06%) ⬆️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@AleksandrMatveev AleksandrMatveev force-pushed the fix/speedup-describe-affected branch 2 times, most recently from 65cbc13 to d9af71c Compare April 1, 2026 22:42
…le calls

The previous fix (dbcba35) correctly forces re-authentication of the target
identity to prevent stale cached credentials from being returned. However,
during commands like `atmos describe affected`, each `!terraform.state` YAML
function resolution creates a new AuthManager via resolveAuthManagerForNestedComponent,
and each one triggers a full AssumeRole API call since the target identity cache
is always skipped. This caused `describe affected` to take ~17 minutes instead of ~2.

Add a process-scoped in-memory sync.Map that caches authenticated credentials
keyed by realm + chain. Within a single CLI invocation, once a chain has been
authenticated, subsequent managers with the same chain reuse the result.
Cached entries are validated for expiration before use. Unlike keyring/file
caches that persist across processes and may contain stale data, the in-memory
cache is inherently fresh and resets when the process exits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@AleksandrMatveev AleksandrMatveev force-pushed the fix/speedup-describe-affected branch from d9af71c to 1533c1e Compare April 1, 2026 23:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/auth/manager_chain_process_cache_test.go (1)

47-242: Consider table-driving the scenario tests to reduce duplication.

These scenarios share setup shape and differ mainly by realm/chain/cache seed + expected auth call count; a table-driven harness will shrink repetition and make additions safer.

As per coding guidelines, "Use table-driven tests for testing multiple scenarios in Go."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/auth/manager_chain_process_cache_test.go` around lines 47 - 242, These
four near-duplicate tests should be combined into a single table-driven test
(e.g. TestProcessCredentialCache_TableDriven) that iterates over cases; for each
case call resetProcessCredentialCache(), optionally pre-seed
processCredentialCache with the case's seed key/value, construct a manager with
the case's chain/realm/identities/providers/credentialStore (reusing the
existing test helper types testProvider, countingIdentity, testCreds,
testStore), call m.authenticateChain(ctx, role), then assert the returned creds
and the identity.callCount matches the case's expected count; use
t.Run(case.name, func(t *testing.T){ ... }) to isolate cases and require/assert
as in the originals. Ensure you reference and reuse authenticateChain,
resetProcessCredentialCache, processCredentialCache, manager.chain,
manager.realm.Value, and countingIdentity.callCount to implement the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/auth/manager_chain_process_cache_test.go`:
- Around line 17-46: Remove the manual countingIdentity test double and replace
it with a mock generated via go.uber.org/mock/mockgen: add a //go:generate
directive in the pkg/auth package referencing the types.Identity interface, run
mockgen to produce a mock (e.g., mock_types_identity.go with
mock.NewMockIdentity), then update tests that referenced countingIdentity
(including uses of Authenticate, PrepareEnvironment, GetProviderName, etc.) to
construct the generated mock, configure expectations/Return values for
Authenticate and other methods, and assert call counts via gomock's EXPECT()
instead of countingIdentity.callCount.

In `@pkg/auth/manager_chain.go`:
- Around line 56-58: The current unchecked type assertion
entry.(*processCachedCreds) can panic if a malformed value is in
processCredentialCache; change it to a safe assertion (e.g., cached, ok :=
entry.(*processCachedCreds)) and skip/evict the entry when ok is false before
calling m.isCredentialValid("process-cache", cached.credentials); ensure you
reference processCredentialCache, cacheKey, processCachedCreds, and
m.isCredentialValid when making the check so the code either continues safely or
clears the bad cache entry instead of panicking.

---

Nitpick comments:
In `@pkg/auth/manager_chain_process_cache_test.go`:
- Around line 47-242: These four near-duplicate tests should be combined into a
single table-driven test (e.g. TestProcessCredentialCache_TableDriven) that
iterates over cases; for each case call resetProcessCredentialCache(),
optionally pre-seed processCredentialCache with the case's seed key/value,
construct a manager with the case's
chain/realm/identities/providers/credentialStore (reusing the existing test
helper types testProvider, countingIdentity, testCreds, testStore), call
m.authenticateChain(ctx, role), then assert the returned creds and the
identity.callCount matches the case's expected count; use t.Run(case.name,
func(t *testing.T){ ... }) to isolate cases and require/assert as in the
originals. Ensure you reference and reuse authenticateChain,
resetProcessCredentialCache, processCredentialCache, manager.chain,
manager.realm.Value, and countingIdentity.callCount to implement the checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 41ab2ee5-4ddb-465d-aa28-c2f95557e129

📥 Commits

Reviewing files that changed from the base of the PR and between 6bafe3f and 1533c1e.

📒 Files selected for processing (2)
  • pkg/auth/manager_chain.go
  • pkg/auth/manager_chain_process_cache_test.go

@aknysh aknysh merged commit bded6f0 into cloudposse:main Apr 2, 2026
56 checks passed
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

These changes were released in v1.214.0-rc.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch A minor, backward compatible change size/m Medium size PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants